2/9/23
Q: When is the presentation?
A: Discussing this today! It’s at the end of the quarter. While written reports will be completed throughout the rest of the quarter as we do case studies, an oral presentation will be part of your final project. These will be able to be recorded or given live in person during finals week.
Q: Will we have another lab with as many questions as Lab 4? The turnaround was pretty stressful, so just want to be prepared.
A: The next lab (multiple linear regression) is also a tady lengthy, but after that I don’t plan on the rest being quite as long. Just as a reminder that you do not need to complete the entire lab to receive credit!
Q: I’m not really comfortable with log transformations yet. Will we get more practice on that?
A: Yup! Last lecture was a first introduction. We’ll return to this in upcoming case studies. The midterm does not require any transformations.
dplyr, viz/ggplot2, and linear regression/tidymodelsGenerate a visualization that will allow readers to determine whether male or female penguins are larger (by mass).
Generate a barplot that visualizes how many penguins there are from each species on each island. Each island should be a different panel (in a 1 row x 3 columns visualization), and each chart should visualize the species count.
Generate a scatterplot that will allow the viewer to determine whether flipper length has differed over time. Be sure to color the points on this plot by species.
Imitation is the highest form of flattery
Example from: Eric Ko
# Eric890916
chessData <- data.frame(country = c("United States", "Germany", "Canada", "Spain", "Russia", "France", "Bosnia and Herzegovina", "Croatia", "Turkey", "Austria"),
num = c(89, 55, 44, 41, 36, 34, 32, 32, 31, 29))
ggplot(chessData, aes(y = reorder(country, num), x = num)) +
geom_col(fill = "#008080") +
geom_text(aes(label = num), hjust = 1, nudge_x = -.5) +
labs(title = "More players transfer to the U.S. than to any other country",
subtitle = "Nations that received the highest number of player transfers, 2000-17",
caption = "2017 data as of April 11. SOURCE: FIDE",
x = "NUMBER OF TRANSFERS", y = "COUNTRY")Example from: Christine Kwon
common_first_names <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/most-common-name/new-top-firstNames.csv")
# editing data
common_first_names <- common_first_names[1:20, ]
common_first_names <- common_first_names %>%
mutate(sex = case_when (name == "Mary" |
name == "Jennifer" |
name == "Patricia" |
name == "Linda" |
name == "Elizabeth" ~ "female",
name != "Mary" |
name != "Jennifer" |
name != "Patricia" |
name != "Linda" |
name != "Elizabeth" ~ "male",),
percentage = round(newPerct2013 * 1000, digits = 1))
# creating visualization
common_first_names %>%
ggplot(aes(y = reorder(name, percentage), x = percentage, fill = sex)) +
geom_histogram(stat = "identity") +
guides(fill = "none") +
annotate("text", x = 9.65, y = 21.7, label = expression(bold("MALE")), cex = 3.85, hjust = 1, vjust = 1, color = "dodgerblue") +
annotate("text", x = 11.5, y = 21.7, label = expression(bold("FEMALE")), cex = 3.85, hjust = 1, vjust = 1, color = "gold1") +
geom_text(aes(label = signif(percentage)), nudge_x = 0.5) +
labs(title = "Most Common First Names",
subtitle = "Per 1,000 Americans as of 2013") +
scale_fill_manual(values = c("male" = "dodgerblue",
"female" = "gold1")) +
theme_classic() +
theme(plot.title.position = "plot",
panel.grid.major.y = element_blank(),
plot.title = element_text(size = 16,
face = "bold"),
plot.subtitle = element_text(size = 11),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
axis.line.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "black"),
axis.title.x = element_blank(),
axis.title.y = element_blank())Example by: Cheng Chang (FA21)
# get data
poll <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/covid-19-polls/master/covid_approval_polls_adjusted.csv")
poll_mean <- read_csv("https://raw.githubusercontent.com/fivethirtyeight/covid-19-polls/master/covid_approval_toplines.csv")
poll <- poll |>
filter(subject == "Biden", party != "all") |>
mutate(Party=case_when(party == "D" ~ "Democrats",
party == "I" ~ "Independents",
party == "R" ~ "Republicans")) |>
mutate(enddate=as.Date(enddate, format="%m/%d/%Y"))
poll_mean <- poll_mean |>
filter(subject == "Biden", party != "all") |>
mutate(Party=case_when(party == "D" ~ "Democrats",
party == "I" ~ "Independents",
party == "R" ~ "Republicans")) |>
mutate(modeldate=as.Date(modeldate, format="%m/%d/%Y"))
ggplot() +
geom_point(data=poll,
aes(x=enddate, y=approve_adjusted, color=Party),
size=1,
alpha = 0.5) +
geom_path(data=poll_mean, aes(x=modeldate, y=approve_estimate, color=Party)) +
labs(title="Approval of Biden’s response varies widely by party",
subtitle=
"A calculation of the share of Democrats, Republicans and independents who approve of the president’s\nhandling of the coronavirus outbreak",
x=NULL,
y=NULL) +
scale_color_manual(values = c("Democrats" = "#2acaea",
"Independents" = "#ce7e00",
"Republicans" = "#f44336")) +
theme(plot.title.position = "plot",
panel.grid.major = element_line(color="grey"),
panel.border = element_rect(fill=NA, color="grey"),
panel.background = element_rect(fill="white"))Take a Sad Plot & Make It Better
Example from: Christine Kwon
medals <- tibble(
country = c(
rep("USA", 79), rep("CHN", 70), rep("ROC", 53), rep("GBR", 48), rep("JPN", 40)),
medal_type = c(
rep("gold", 25), rep("silver", 31), rep("bronze", 23),
rep("gold", 32), rep("silver", 22), rep("bronze", 16),
rep("gold", 14), rep("silver", 21), rep("bronze", 18),
rep("gold", 15), rep("silver", 18), rep("bronze", 15),
rep("gold", 21), rep("silver", 7), rep("bronze", 12)))
# creating visualization
medal_viz <- medals %>%
mutate(country = factor(country, levels = c("JPN", "GBR","ROC", "CHN", "USA"))) %>%
ggplot(aes(y = country, fill = factor(medal_type, levels = c("bronze", "silver", "gold")))) +
geom_bar() +
annotate("text", x = 4.5, y = 5.05, label = "25", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 30.5, y = 5.05, label = "31", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 61.5, y = 5.05, label = "23", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 86.5, y = 5.05, label = expression(bold("79")), cex = 5, hjust = 1, vjust = 1) +
annotate("text", x = 4.5, y = 4.05, label = "32", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 37.5, y = 4.05, label = "22", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 58.5, y = 4.05, label = "16", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 76.5, y = 4.05, label = expression(bold("70")), cex = 5, hjust = 1, vjust = 1) +
annotate("text", x = 4.5, y = 3.05, label = "14", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 18.5, y = 3.05, label = "21", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 39.5, y = 3.05, label = "18", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 59.5, y = 3.05, label = expression(bold("53")), cex = 5, hjust = 1, vjust = 1) +
annotate("text", x = 4.5, y = 2.05, label = "15", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 19.5, y = 2.05, label = "18", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 38, y = 2.05, label = "15", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 54.5, y = 2.05, label = expression(bold("48")), cex = 5, hjust = 1, vjust = 1) +
annotate("text", x = 4.5, y = 1.05, label = "21", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 23.5, y = 1.05, label = "7", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 32.5, y = 1.05, label = "12", cex = 4, hjust = 1, vjust = 1) +
annotate("text", x = 46.5, y = 1.05, label = expression(bold("40")) , cex = 5, hjust = 1, vjust = 1) +
labs(title = "Medals Won at the Tokyo Olympics (ongoing)",
subtitle = "Distribution of medals won by the top 5 countries (ordered by total)",
fill = "Medal Type") +
scale_fill_manual(values = c("gold" = "gold",
"silver" = "gray75",
"bronze" = "tan3")) +
theme(#legend.title = element_text(face = "bold"),
legend.position = "top") +
guides(fill = guide_legend(title.position = "top")) +
theme_classic() +
theme(plot.title.position = "plot",
panel.grid.major.y = element_blank(),
plot.title = element_text(size = 16,
face = "bold"),
plot.subtitle = element_text(size = 11),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank(),
axis.line.x = element_blank(),
axis.line.y = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_text(color = "black",
#face = "bold",
size = 11),
axis.title.x = element_blank(),
axis.title.y = element_blank())
medal_viz +
theme(legend.position = c(0.8, 0.25))tidyverseFor each case study (2), during lecture:
For each case study:
With your group, you will:
You’ll need to do something more on the topic beyond what is presented in class.
Examples:
Graded on:
Two possible Paths:
You get to choose:
…probably not
But, you should start thinking about/getting a group of 3-4 people together.
I’d recommend you start planning/working on your final project around wk 8